Abstract. We survey facts mostly emerging from the seminal results of
Alan Cobham obtained in the late sixties and early seventies. We do not 
attempt to be exhaustive but try instead to give some personal interpre- 
tations and some research directions. We discuss the notion of numera- 
tion systems, recognizable sets of integers and automatic sequences. We 
briefly sketch some results about transcendence related to the represen- 
tation of real numbers. We conclude with some applications to combina- 
torial game theory and verification of infinite-state systems and present 
a list of open problems. 



1 Introduction 

It is challenging to give a talk about the interactions existing between formal 
language theory and number theory. The topic is vast, has several entry points 
and many applications. To cite just a few: non-adjacent form (NAF)
representations to speed up computations arising in elliptic curve cryptography,
verification of infinite-state systems [53], combinatorial game theory, fractals
and tilings [82,20], transcendence results, dynamical systems and ergodic theory
[19, Chap. 5-7], [13,73]. For instance, there exist tight and fruitful links between
properties sought for in dynamical systems and combinatorial properties of the 
corresponding words and languages. 

The aim of this paper is to briefly survey some results mostly emerging from 
the seminal papers of Cobham of the late sixties and early seventies [35,36,37],
while also trying to give some personal interpretations and some research direc- 
tions. We do not provide an exhaustive survey of the existing literature but we 
will give some pointers that we hope could be useful to the reader. 

* Dedicated to the memory of my grandfather Georges Henderyckx 1930-2010.

When one considers such interactions, the main ingredient is definitely the
notion of numeration system, which provides a bridge between a set of numbers
(integers, real numbers or elements of some other algebraic structures [68,9])
and formal language theory. On the one hand, arithmetic properties of numbers
or sets of numbers are of interest and, on the other hand, syntactical properties
of the corresponding representations may be studied. To start with, we consider
the familiar integer base $k \ge 2$ numeration system. Any integer $n > 0$ is uniquely
represented by a finite word (its $k$-ary representation)
$\mathrm{rep}_k(n) = d_\ell \cdots d_0$ over the alphabet $A_k = \{0, \ldots, k-1\}$ such that
$\sum_{i=0}^{\ell} d_i k^i = n$ and $d_\ell \neq 0$. Note that imposing the uniqueness of the
representation allows us to define a map $\mathrm{rep}_k : \mathbb{N} \to A_k^*$. Nevertheless, in
many contexts it is useful to consider all the representations of an integer n
over a given finite alphabet $D \subset \mathbb{Z}$, that is, all the words
$c_\ell \cdots c_0 \in D^*$ such that $\sum_{i=0}^{\ell} c_i k^i = n$. For instance, if w is the
$k$-ary representation of n and if the alphabet D is simply $A_k$, then for all
$j \ge 0$, $0^j w$ is another representation of n.
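As a minimal sketch of the two maps just described (the function names `rep` and `val` are ours and purely illustrative), the following Python code computes the $k$-ary representation of an integer and evaluates an arbitrary word of integer digits in base $k$. It can be used to check that words with leading zeroes, or with digits taken in a larger alphabet $D \subset \mathbb{Z}$, represent the same integer.

```python
def rep(n, k):
    """k-ary representation of n as a list of digits, most significant first.

    Following the convention of the text, rep(0, k) is the empty word.
    """
    if n < 0 or k < 2:
        raise ValueError("expected n >= 0 and k >= 2")
    digits = []
    while n > 0:
        n, d = divmod(n, k)
        digits.append(d)
    return digits[::-1]


def val(word, k):
    """Numerical value in base k of a word over any alphabet of integers."""
    value = 0
    for d in word:
        value = value * k + d  # Horner scheme, most significant digit first
    return value


if __name__ == "__main__":
    w = rep(11, 2)
    print(w, val([0, 0] + w, 2), val([1, -1, 1], 2))
```

The last call evaluates a word over the signed alphabet {-1, 0, 1}, which is one example of an alphabet D different from $A_k$.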

In the same way, any real number $r \in (0,1)$ is represented by an infinite word
$d_1 d_2 \cdots$ over $A_k$ such that $\sum_{i=1}^{+\infty} d_i k^{-i} = r$. Uniqueness of the
representation may be obtained by taking the maximal word for the lexicographic
ordering on $A_k^{\mathbb{N}}$ satisfying the latter equality; in this case, the sequence
$(d_i)_{i \ge 1}$ is not ultimately constant and equal to $k-1$: there is no N such
that, for all $n \ge N$, $d_n = k-1$. To represent a real number $r > 0$, we take
separately its integer part $\lfloor r \rfloor$ and its fractional part $\{r\}$. Base
$k$-complements or signed number representations [70] can be used to represent
negative elements as well, the sign being determined by the most significant
digit which is thus 0 or $k-1$.
By convention, the empty word $\varepsilon$ represents 0, i.e., $\mathrm{rep}_k(0) = \varepsilon$. If the
numeration system is fixed, say the base k is given, then any integer n (resp.
any real number $r > 0$) corresponds to a finite (resp. infinite) word over $A_k$
(resp. over $A_k \cup \{\star\}$, where $\star$ is a new symbol used as a separator).
Therefore any set of numbers corresponds to a language of representations and
we naturally seek some link between the arithmetic properties of the numbers
belonging to the set and the syntactical properties of the corresponding
representations. Let X be a subset of N. Having in mind Chomsky's hierarchy,
the set X could be considered quite "simple" from an algorithmic point of view
whenever the set of $k$-ary representations of the elements in X is a regular
(or rational) language accepted by a finite automaton. A set $X \subseteq \mathbb{N}$
satisfying this property is said to be $k$-recognizable. Note that X is
$k$-recognizable if and only if $0^* \mathrm{rep}_k(X)$ is regular.
As an example, a DFA (i.e., a deterministic finite automaton) accepting exactly
the binary representations of the integers congruent to 3 (mod 4) is given in
Figure 1.

Fig. 1. A finite automaton accepting $0^* \mathrm{rep}_2(4\mathbb{N} + 3)$.

Similarly, a set $X \subseteq \mathbb{R}$ of real numbers is $k$-recognizable if there exists
a finite (non-deterministic) Büchi automaton accepting all the $k$-representations
over $A_k$ of the elements in X, that is, the representations starting with an
arbitrary number of leading zeroes, and in particular the ones ending with
$(k-1)^{\omega}$. Such an automaton is often called a Real Number Automaton [25]. The
Büchi automaton in Figure 2 (borrowed from a talk given by V. Bruyère) accepts
all the possible binary encodings (using base 2-complement for negative numbers)
of elements in the set $\{2n + (0, 4/3) \mid n \in \mathbb{Z}\}$. For instance 3/2 is encoded
by the language of infinite words
$0^+ 1 \star 1 0^{\omega} \cup 0^+ 1 \star 0 1^{\omega}$. Note that the transition $3 \to 6$
(resp. $2 \to 4$) is considered for an odd (resp. even) integer part and the series
$\sum_{i=1}^{+\infty} 4^{-i} = 1/3$ corresponds to the cycle {5, 6}.

Fig. 2. A Büchi automaton accepting $\{2n + (0, 4/3) \mid n \in \mathbb{Z}\}$.
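Returning to sets of integers, the automaton of Figure 1 can be imitated by a generic construction. The following Python sketch is our own illustration; it is not the automaton drawn in the figure and is not claimed to be minimal. It builds a DFA over $A_k$ whose state, after reading a word, is the numerical value of that word modulo m. With r as the only final state, it accepts $0^* \mathrm{rep}_k(m\mathbb{N} + r)$. The names `residue_dfa`, `accepts` and `digits` are hypothetical helpers.

```python
def residue_dfa(k, m):
    """Transition table of a DFA with states 0..m-1 over digits 0..k-1.

    Reading digit d from state q (the value read so far, mod m) leads to
    the state (q * k + d) mod m, so leading zeroes keep the state at 0.
    """
    return {(q, d): (q * k + d) % m for q in range(m) for d in range(k)}


def accepts(delta, word, final_states, start=0):
    """Run the DFA on a word given most significant digit first."""
    state = start
    for d in word:
        state = delta[(state, d)]
    return state in final_states


def digits(n, k):
    """k-ary representation of n, most significant digit first."""
    out = []
    while n > 0:
        n, d = divmod(n, k)
        out.append(d)
    return out[::-1]


if __name__ == "__main__":
    delta = residue_dfa(2, 4)
    accepted = [n for n in range(32) if accepts(delta, digits(n, 2), {3})]
    print(accepted)  # integers below 32 congruent to 3 modulo 4
```

The same table with other values of k, m and r illustrates why every arithmetic progression is recognizable in every integer base.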

To generalize the $k$-ary integer base system, it is quite natural to consider
an increasing sequence of integers, like the Fibonacci sequence, as a numeration
basis to get a greedy decomposition of any integer (see Definition 2) or
the negative powers of a real number $\beta > 1$ to develop any real $r \in (0,1)$ as
$\sum_{i=1}^{+\infty} c_i \beta^{-i}$ with the coefficients $c_i$ belonging to a convenient finite
alphabet. Let us point out Fraenkel's paper [54] for some general ideas about
representations of integers in various numeration systems. Among non-standard
decompositions of integers, let us mention the so-called combinatorial
numeration system going back to Lehmer and Katona, where integers are decomposed
using binomial coefficients with some prescribed property, also see [33], and
the factorial numeration system [75]. In Frougny and Sakarovitch's chapter
[19, Chap. 2] and in Frougny's chapter [76, Chap. 7] many details on recognizable
sets and about the representation of integers and real numbers are given. In
particular, Parry's $\beta$-developments of real numbers [80] are presented in the
latter reference. Abstract numeration systems (see Definition 6) are a general
framework to study recognizable sets of integers, see [71] and [19, Chap. 3].

The seminal work of Cobham may be considered as a starting point for the 
study of recognizable sets for at least three reasons. Let us sketch these below. 
Details and definitions will be given in the next sections. 

(i) Cobham's theorem from 1969 [36] states that the recognizability of a set of
integers, as introduced above, strongly depends on the choice of the base,
e.g., there are sets which are 2-recognizable but not 3-recognizable. The
subsets of N that are recognizable in all bases are exactly the ultimately
periodic sets, i.e., the finite unions of arithmetic progressions. See Theorem 1
in Section 2 below for the exact statement of the result. It is an easy exercise
to show that an arithmetic progression is $k$-recognizable for all $k \ge 2$ (e.g.,
Figure 1). See for instance [55, prologue] about the machine à diviser de
Blaise Pascal. In that direction, an interesting question [7] is to obtain the
state complexity of the minimal automaton recognizing a given divisibility
criterion in an integer base. For this state complexity question studied in the
wider context of linear numeration systems (cf. Definition 3), see [32].

The base dependence of recognizability shown by Cobham's result strongly 
motivates the general study of recognizable sets and the introduction of 
non-standard or exotic numeration systems based on an increasing sequence 
satisfying a linear recurrence relation. 

For integer base k numeration systems, nice characterizations of recognizable
sets are well-known: logical characterization by first order formulas in a
suitable extension of the Presburger arithmetic $\langle \mathbb{N}, +, V_k \rangle$,
$k$-automatic characteristic sequence generated through a uniform morphism of
length k, characterization in terms of algebraic formal power series for a prime
base. See the excellent survey [29] where the so-called Cobham-Semenov theorem,
which extends Cobham's original result from 1969 to subsets of $\mathbb{N}^d$, $d \ge 2$,
is also presented. Recall that the characteristic sequence
$(x_i)_{i \ge 0} \in \{0,1\}^{\mathbb{N}}$ of $X \subseteq \mathbb{N}$ is defined by $x_i = 1$ if and only if
$i \in X$. It is not our goal to give here a full list of pointers to the existing
bibliography on the ramifications of Cobham's theorem, see for instance [1^. For
presentations of Cobham's theorem based on Georges Hansel's work, see [81,12]
together with [84].

(ii) The next paper of Cobham from 1972 [37] introduced the concept of
$k$-automatic sequences (originally called tag sequences, see Definition 5) and
linked numeration systems with the so-called substitutions and morphic
words (see Definition 4). It is easy to see that a set $X \subseteq \mathbb{N}$ is
$k$-recognizable if and only if the characteristic sequence of X is a
$k$-automatic infinite word over {0,1}. For a comprehensive book on $k$-automatic
sequences, see [12]. As we will see, this approach gives another way to
generalize the notion of a recognizable set by considering sets having a morphic
characteristic sequence (see Remark 2). Details will be presented in Section 3.

(iii) As the reader may already have noticed, this survey is mainly oriented
towards sets of numbers (integers) giving rise to a language of representations.
Another direction is to consider a single real number and the infinite word
representing it in a given base. Alan Cobham also conjectured the following
result, proved later on by Adamczewski and Bugeaud. Let $\alpha$ be an algebraic
irrational real number. Then the base-$k$ expansion of $\alpha$ cannot be generated
by a finite automaton. Cobham's question follows a question of Hartmanis and
Stearns [64]: does there exist an algebraic irrational number computable in
linear time by a (multi-tape) Turing machine? In the same way, if an infinite
word w over the finite alphabet $A_k$ of digits has some specific combinatorial
properties (like a low factor complexity, or being morphic or substitutive), is
the corresponding real number having w as $k$-ary representation transcendental?
Let us mention that several surveys in that direction are worth reading:
[77, Chap. 10], [19, Chap. 8], [212]. We will briefly sketch some of these
developments in Section 4.

In Section 5 we sketch some of the links existing between numeration systems,
combinatorics on words and combinatorial game theory. Cobham's theorem about
base dependence also appears in the framework of the verification of
infinite state systems, see Section 6. Finally, in Section 7 we give some paths
to the literature that the interested reader may follow and in Section 8 we
present some open questions.

2 Cobham's Theorem and Base Dependence 

Two integers $k, \ell \ge 2$ are multiplicatively independent if the only integers
m, n such that $k^m = \ell^n$ are $m = n = 0$. Otherwise stated, $k, \ell \ge 2$ are
multiplicatively independent if and only if $\log k / \log \ell$ is irrational. Recall
a classical result in elementary number theory, known as Kronecker's theorem:
if $\theta > 0$ is irrational, then the set $\{\{n\theta\} \mid n > 0\}$ is dense in [0,1].
This result is an important argument in the proof of the following theorem.

Theorem 1 (Cobham's theorem [36]). Let $k, \ell \ge 2$ be two multiplicatively
independent integers. A set $X \subseteq \mathbb{N}$ is simultaneously $k$-recognizable and
$\ell$-recognizable if and only if X is ultimately periodic.

Obviously the set $P_2 = \{2^n \mid n \ge 0\}$ of powers of two is 2-recognizable
because $\mathrm{rep}_2(P_2) = 10^*$. But since $P_2$ is not ultimately periodic, Cobham's
theorem implies that $P_2$ cannot be 3-recognizable. To see that a given infinite
ordered set $X = \{x_0 < x_1 < x_2 < \cdots\}$ is $k$-recognizable for no base $k \ge 2$
at all, we can use results like the following one where the behavior of the ratio
(resp. difference) of any two consecutive elements in X is studied through the
quantities

$$R_X := \limsup_{i \to \infty} \frac{x_{i+1}}{x_i} \quad \text{and} \quad D_X := \limsup_{i \to \infty} (x_{i+1} - x_i).$$

Theorem 2 (Gap theorem [37]). Let $k \ge 2$. If $X \subseteq \mathbb{N}$ is a $k$-recognizable
infinite subset of N, then either $R_X > 1$ or $D_X < +\infty$.

Corollary 1. Let $a \in \mathbb{N}_{\ge 2}$. The set of primes and the set $\{n^a \mid n \ge 0\}$
are never $k$-recognizable for any integer base $k \ge 2$.

Proofs of the Gap theorem and its corollary can also be found in [50]. It is
easy to show that $X \subseteq \mathbb{N}$ is $k$-recognizable if and only if it is
$k^n$-recognizable, $n \in \mathbb{N} \setminus \{0\}$. As a consequence of Cobham's theorem, sets of
integers can be classified into three categories:

— ultimately periodic sets which are recognizable for all bases,

— sets which are $k$-recognizable for some $k \ge 2$, and which are $\ell$-recognizable
only for those $\ell \ge 2$ such that k and $\ell$ are multiplicatively dependent bases,
for example, the set $P_2$ of powers of two,

— sets which are $k$-recognizable for no base $k \ge 2$ at all, for example, the set
of squares.

Definition 1. An infinite ordered set $X = \{x_0 < x_1 < x_2 < \cdots\}$ such that
$D_X < +\infty$ is said to be syndetic or with bounded gaps. Otherwise stated, X is
syndetic if there exists $C > 0$ such that, for all $n \ge 0$, $x_{n+1} - x_n < C$.

If $X \subseteq \mathbb{N}$ is ultimately periodic, then X is syndetic. Note that the converse
does not hold. For instance, consider the complement of the set $\{2^n \mid n \ge 0\}$
which is syndetic, 2-recognizable but not ultimately periodic.

Example 1 (Thue-Morse set). Let $n \in \mathbb{N}$. Denote by $s_k(n)$ the classical
number-theoretic function summing up the digits appearing in $\mathrm{rep}_k(n)$. As a
classical example, consider the set $T = \{n \in \mathbb{N} \mid s_2(n) \equiv 0 \bmod 2\}$. This
set is 2-recognizable and syndetic but not ultimately periodic. It appears in
several contexts [11] and in particular, it provides a solution to Prouhet's
problem (also known as the Prouhet-Tarry-Escott problem which is a special case
of a multi-grade equation).
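The set T of Example 1 is easy to experiment with. The Python sketch below is only an illustration under our own naming (`digit_sum`, `thue_morse_set`, `equal_power_sums`). It lists the first elements of T and checks, for small m, the classical form of Prouhet's solution: T and its complement, both restricted to the integers below $2^m$, have the same sum of $j$-th powers for every $j < m$.

```python
def digit_sum(n, k=2):
    """s_k(n): sum of the digits of the k-ary representation of n."""
    total = 0
    while n > 0:
        n, d = divmod(n, k)
        total += d
    return total


def thue_morse_set(bound):
    """Elements of T = {n : s_2(n) is even} that are smaller than bound."""
    return [n for n in range(bound) if digit_sum(n, 2) % 2 == 0]


def equal_power_sums(m):
    """Compare sums of j-th powers (j = 0..m-1) over T and its complement,
    both restricted to the interval [0, 2^m)."""
    inside = thue_morse_set(2 ** m)
    members = set(inside)
    outside = [n for n in range(2 ** m) if n not in members]
    return all(
        sum(n ** j for n in inside) == sum(n ** j for n in outside)
        for j in range(m)
    )


if __name__ == "__main__":
    print(thue_morse_set(16))
    print(all(equal_power_sums(m) for m in range(1, 7)))
```

For m = 3 this compares {0, 3, 5, 6} with {1, 2, 4, 7}: both have 4 elements, sum 14 and sum of squares 70.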

The set of squares is $k$-recognizable for no integer base k but as we shall
see this set is recognizable for some non-standard numeration systems (see
Example 3). One possible extension of $k$-ary numeration systems is to consider a
numeration basis.

Definition 2. A numeration basis is an increasing sequence $U = (U_n)_{n \ge 0}$ of
integers such that $U_0 = 1$ and $\sup_{i \ge 0} U_{i+1}/U_i$ is finite.

Using the greedy algorithm, any integer $n > 0$ has a unique decomposition

$$n = \sum_{i=0}^{\ell} c_i U_i$$

where the coefficients $c_i$ belong to the finite set
$A_U = \{0, \ldots, \sup \lceil U_{i+1}/U_i \rceil - 1\}$. Indeed there exists a unique
$\ell \ge 0$ such that $U_\ell \le n < U_{\ell+1}$. Set $r_\ell = n$. For all $i = \ell, \ldots, 1$,
proceed to the Euclidean division $r_i = c_i U_i + r_{i-1}$, with $r_{i-1} < U_i$, and
finally set $c_0 = r_0$. The word $c_\ell \cdots c_0$ is the (normal) $U$-representation of
n and is denoted by $\mathrm{rep}_U(n)$. Naturally, these non-standard numeration systems
include the usual integer base k system by taking $U_n = k^n$ for all $n \ge 0$. The
numerical value map $\mathrm{val}_U : A_U^* \to \mathbb{N}$ maps any word $d_\ell \cdots d_0$ over $A_U$
onto $\sum_{i=0}^{\ell} d_i U_i$.
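The greedy algorithm can be sketched in a few lines of Python. This is a minimal illustration, assuming the basis is supplied as a function `U` returning $U_i$ for an index i; the names `greedy_rep`, `val_u` and `fibonacci` are ours. The Fibonacci basis used in the demonstration is the one with $U_0 = 1$ and $U_1 = 2$.

```python
def greedy_rep(n, U):
    """Normal U-representation of n, most significant digit first.

    U is a function i -> U_i describing an increasing basis with U(0) == 1.
    The empty list represents 0.
    """
    if n == 0:
        return []
    ell = 0
    while U(ell + 1) <= n:  # find the unique ell with U_ell <= n < U_{ell+1}
        ell += 1
    digits, remainder = [], n
    for i in range(ell, -1, -1):
        c, remainder = divmod(remainder, U(i))  # Euclidean division by U_i
        digits.append(c)
    return digits


def val_u(word, U):
    """Numerical value of a word d_ell ... d_0 with respect to the basis U."""
    ell = len(word) - 1
    return sum(d * U(ell - j) for j, d in enumerate(word))


def fibonacci(i):
    """Fibonacci basis with U_0 = 1, U_1 = 2 and U_{n+2} = U_{n+1} + U_n."""
    a, b = 1, 2
    for _ in range(i):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(greedy_rep(11, fibonacci))          # 11 = 8 + 3
    print(greedy_rep(17, lambda i: 3 ** i))   # usual base 3 expansion
```

Taking `U = lambda i: k ** i` recovers the usual base k expansion, in line with the remark that integer bases are a special case.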

Remark 1. By contrast with abstract numeration systems that will be introduced
later on, when dealing with a numeration basis we often use the terminology
of a positional numeration system to emphasize the fact that a digit $d \in A_U$
in the ith position (counting from the right, i.e., considering the least
significant digit first) of a $U$-representation has a weight d multiplied by the
corresponding element $U_i$ of the basis.

Having in mind the notion of $k$-recognizable sets, a set $X \subseteq \mathbb{N}$ is said to be
$U$-recognizable if $\mathrm{rep}_U(X) = \{\mathrm{rep}_U(n) \mid n \in X\}$ is a regular language over
the alphabet $A_U$. Note that $\mathrm{rep}_U(X)$ is regular if and only if
$0^* \mathrm{rep}_U(X)$ is regular.

Definition 3. A numeration basis $U = (U_n)_{n \ge 0}$ is said to be linear, if there
exist $a_0, \ldots, a_{k-1} \in \mathbb{Z}$ such that

$$\forall n \ge 0, \quad U_{n+k} = a_{k-1} U_{n+k-1} + \cdots + a_0 U_n. \tag{1}$$

If $\lim_{n \to \infty} U_{n+1}/U_n = \beta$ for some real $\beta > 1$, then U is said to satisfy the
dominant root condition and $\beta$ is called the dominant root of the recurrence.

If N is $U$-recognizable, then U is a linear numeration basis [89,19] (hint:
observe that $\mathrm{rep}_U(\{U_n \mid n \ge 0\}) = 10^*$). For a discussion on sufficient
conditions on the recurrence relation satisfied by U for the $U$-recognizability
of N, see [55] and [75]. In particular, as shown by the next result, for a linear
numeration basis U, the set N is $U$-recognizable if and only if all ultimately
periodic sets are $U$-recognizable.

Theorem 3 (Folklore [19]). Let $p, r \ge 0$. If $U = (U_n)_{n \ge 0}$ is a linear
numeration basis, then

$$\mathrm{val}_U^{-1}(p\,\mathbb{N} + r) = \Bigl\{ c_\ell \cdots c_0 \in A_U^* \Bigm| \sum_{k=0}^{\ell} c_k U_k \in p\,\mathbb{N} + r \Bigr\}$$

is accepted by a DFA that can be effectively constructed. In particular, if N is
$U$-recognizable, then any ultimately periodic set is $U$-recognizable.

Example 2. Consider the Fibonacci numeration system given by the basis $F_0 = 1$,
$F_1 = 2$ and $F_{n+2} = F_{n+1} + F_n$ for all $n \ge 0$. For this system,
$0^* \mathrm{rep}_F(\mathbb{N})$ is given by the set of words over {0, 1} avoiding the factor 11
and the set of even numbers is $F$-recognizable [S^ using the DFA shown in
Figure 3.

Fig. 3. A finite automaton accepting $0^* \mathrm{rep}_F(2\mathbb{N})$.



To conclude this section, we present a linear numeration basis U such that
the set of squares $Q = \{n^2 \mid n \in \mathbb{N}\}$ is $U$-recognizable. This set will also be
used in Example 4 to get a set having a morphic characteristic sequence.

Example 3. Consider the sequence given by $U_n = (n+1)^2$ for all $n \ge 0$. This
sequence satisfies, for all $n \ge 0$, the relation
$U_{n+3} = 3U_{n+2} - 3U_{n+1} + U_n$. In that case,
$\mathrm{rep}_U(\mathbb{N}) \cap 10^*10^* = \{10^a10^b \mid b^2 < 2a + 4\}$ showing with the pumping
lemma that N is not $U$-recognizable [89]. But trivially, we have
$\mathrm{rep}_U(Q) = 10^*$.
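The claims of Example 3 can be checked experimentally. The Python sketch below is an illustration with hypothetical helper names (`square_basis`, `greedy_rep`, `is_normal`). It recomputes greedy representations for the basis $U_n = (n+1)^2$ and tests, on small exponents, both the description of the words $10^a10^b$ that are normal representations and the fact that nonzero squares are represented by words of $10^*$.

```python
def square_basis(i):
    """The basis U_i = (i + 1)^2 of Example 3."""
    return (i + 1) ** 2


def greedy_rep(n, U):
    """Normal U-representation of n (most significant digit first)."""
    if n == 0:
        return []
    ell = 0
    while U(ell + 1) <= n:
        ell += 1
    digits, remainder = [], n
    for i in range(ell, -1, -1):
        c, remainder = divmod(remainder, U(i))
        digits.append(c)
    return digits


def is_normal(word, U):
    """True when word is the greedy representation of its own value."""
    ell = len(word) - 1
    value = sum(d * U(ell - j) for j, d in enumerate(word))
    return greedy_rep(value, U) == word


if __name__ == "__main__":
    # 1 0^a 1 0^b is a normal representation exactly when b^2 < 2a + 4
    for a in range(4):
        row = [is_normal([1] + [0] * a + [1] + [0] * b, square_basis)
               for b in range(5)]
        print(a, row)
```

Only finitely many exponents are tested here, so this is a sanity check of the stated formula rather than a proof.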
3 Substitutions and Abstract Numeration Systems

For basic facts on morphisms over $A^*$ or the usual distance put on $A^{\mathbb{N}}$ (which
gives a notion of convergence), see classic textbooks like [12,19,76]. Let A be a
finite alphabet and $\sigma : A^* \to A^*$ be a morphism. If there exist a letter
$a \in A$ and a word $u \in A^+$ such that $\sigma(a) = au$ and moreover, if
$\lim_{n \to +\infty} |\sigma^n(a)| = +\infty$, then $\sigma$ is said to be prolongable on a.

Definition 4. Let $\sigma : A^* \to A^*$ be a morphism prolongable on a. We have

$$\sigma(a) = au, \quad \sigma^2(a) = a\,u\,\sigma(u), \quad \sigma^3(a) = a\,u\,\sigma(u)\,\sigma^2(u), \quad \ldots$$

Since, for all $n \in \mathbb{N}$, $\sigma^n(a)$ is a prefix of $\sigma^{n+1}(a)$ and because
$|\sigma^n(a)|$ tends to infinity when $n \to +\infty$, the sequence $(\sigma^n(a))_{n \ge 0}$
converges to an infinite word denoted by $\sigma^{\omega}(a)$ and given by

$$\sigma^{\omega}(a) := \lim_{n \to +\infty} \sigma^n(a) = a\,u\,\sigma(u)\,\sigma^2(u)\,\sigma^3(u) \cdots.$$

This infinite word is a fixed point of $\sigma$. An infinite word obtained in this way
by iterating a prolongable morphism is said to be purely morphic. In the
literature, one also finds the term pure morphic. If $x \in A^{\mathbb{N}}$ is purely morphic
and if $\tau : A \to B$ is a coding (or letter-to-letter morphism), then the word
$y = \tau(x)$ is said to be morphic.

Definition 5. Let $k \ge 2$. A morphic word $w \in B^{\mathbb{N}}$ is $k$-automatic if there
exists a morphism $\sigma : A^* \to A^*$ and a coding $\tau$ such that
$w = \tau(\sigma^{\omega}(a))$ and, for all $c \in A$, $|\sigma(c)| = k$. A morphism satisfying this
latter property is said to be uniform.

The link between $k$-recognizable sets and $k$-automatic sequences is given by
the following result. In particular, in the proof of this result, it is
interesting to note that an automaton is canonically associated with a morphism.

Theorem 4 [37]. An infinite word $w = w_0 w_1 w_2 \cdots$ over an alphabet A is
$k$-automatic if and only if, for all $a \in A$, the set
$X_a = \{i \in \mathbb{N} \mid w_i = a\}$ is $k$-recognizable.

Otherwise stated, $w = w_0 w_1 w_2 \cdots \in A^{\mathbb{N}}$ is $k$-automatic if and only if
there exists a deterministic finite automaton with output (DFAO) M where Q is
the set of states of M, $q_0 \in Q$ is the initial state,
$\delta : Q \times A_k \to Q$ (resp. $\tau : Q \to A$) is the transition function
(resp. output function) of M, such that $\tau(\delta(q_0, \mathrm{rep}_k(n))) = w_n$ for all
$n \ge 0$.
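The automaton canonically associated with a uniform morphism can be made explicit: its states are the letters, and reading the digit d from the state c leads to the letter at position d in $\sigma(c)$. The Python sketch below illustrates this with the Thue-Morse morphism. The encoding of morphisms as dictionaries of strings and the names `automaton_of`, `automatic_letter` and `morphic_prefix` are assumptions made for the example.

```python
def digits(n, k):
    """k-ary representation of n, most significant digit first."""
    out = []
    while n > 0:
        n, d = divmod(n, k)
        out.append(d)
    return out[::-1]


def automaton_of(sigma):
    """Transition function associated with a uniform morphism sigma:
    from state c, reading digit d leads to the d-th letter of sigma(c)."""
    return {(c, d): image[d]
            for c, image in sigma.items() for d in range(len(image))}


def automatic_letter(n, sigma, tau, start):
    """Letter w_n computed by feeding rep_k(n) to the DFAO (k = |sigma(c)|)."""
    k = len(sigma[start])
    delta = automaton_of(sigma)
    state = start
    for d in digits(n, k):
        state = delta[(state, d)]
    return tau[state]


def morphic_prefix(sigma, tau, start, length):
    """Prefix of tau(sigma^omega(start)) obtained by iterating sigma."""
    word = start
    while len(word) < length:
        word = "".join(sigma[c] for c in word)
    return "".join(tau[c] for c in word[:length])


if __name__ == "__main__":
    thue_morse = {"a": "ab", "b": "ba"}
    coding = {"a": "0", "b": "1"}
    by_automaton = "".join(
        automatic_letter(n, thue_morse, coding, "a") for n in range(16))
    print(by_automaton)
    print(morphic_prefix(thue_morse, coding, "a", 16))
```

Both computations produce the same prefix, and the positions carrying the letter 0 are exactly the elements of the Thue-Morse set T of Example 1.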

Remark 2. Using automata as a model of computation, $U$-recognizable sets
naturally raise some interest. On the same level, sets of integers having a
morphic characteristic sequence can be considered as another natural
generalization of the concept of $k$-recognizability. Iterations of a morphism may
be used to get inductively further elements of the set defined by the morphism
and a coding. As will be shown by Theorem 6, similarly to the case of uniform
morphisms (as given in Definition 5) described above, the computation of a given
element can also be done by using a DFAO and representations of integers in an
abstract numeration system.
Example 4. Consider the alphabet A = {a, b, c} and the morphism
$\sigma : A^* \to A^*$ defined by $\sigma : a \mapsto abcc,\ b \mapsto bcc,\ c \mapsto c$. We get

$$\sigma^{\omega}(a) = abccbccccbccccccbccccccccbccccccccccbcc \cdots.$$

It is easy to see that considering the coding $\tau : a, b \mapsto 1$ and
$\tau : c \mapsto 0$, the word $\tau(\sigma^{\omega}(a))$ is the characteristic sequence of the set
of squares.
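The statement of Example 4 can be tested by iterating the morphism. In the Python sketch below (an illustration; the dictionary encoding and the names `fixed_point_prefix` and `coded_prefix` are ours), a prefix of $\sigma^{\omega}(a)$ is computed, the coding is applied, and the positions of the letter 1 are compared with the perfect squares.

```python
from math import isqrt

SIGMA = {"a": "abcc", "b": "bcc", "c": "c"}  # morphism of Example 4
TAU = {"a": "1", "b": "1", "c": "0"}         # coding of Example 4


def fixed_point_prefix(sigma, letter, length):
    """Prefix of the fixed point of sigma starting with the given letter.

    Assumes that sigma is prolongable on that letter, so that the
    iterates keep growing and the loop terminates.
    """
    word = letter
    while len(word) < length:
        word = "".join(sigma[c] for c in word)
    return word[:length]


def coded_prefix(length):
    """Prefix of tau(sigma^omega(a)) of the requested length."""
    return "".join(TAU[c] for c in fixed_point_prefix(SIGMA, "a", length))


if __name__ == "__main__":
    prefix = coded_prefix(50)
    print(prefix)
    print([i for i, bit in enumerate(prefix) if bit == "1"])
```

The printed positions are the squares below 50, starting with 0, which is coded by the initial letter a.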

The factor complexity of an infinite word w is the non-decreasing function
$p_w : \mathbb{N} \to \mathbb{N}$ mapping n onto the number of distinct factors (or subwords)
of length n occurring in w. See for instance [19, Chap. 4]. For a survey on the
factor complexity of morphic words, see [8]. In 1972, Cobham already observed
that if w is $k$-automatic, then $p_w$ is in $O(n)$. For instance, the factor
complexity of the characteristic sequence of the Thue-Morse set T considered in
Example 1 is computed in [2f 39l.

Theorem 5 (Morse-Hedlund's Theorem). Let $x = x_0 x_1 x_2 \cdots$ be an infinite
word over A. The following conditions are equivalent.

— The complexity function $p_x$ is bounded by a constant, i.e., there exists C such
that for all $n \in \mathbb{N}$, $p_x(n) \le C$.

— There exists $N_0 \in \mathbb{N}$ such that for all $n \ge N_0$, $p_x(n) = p_x(N_0)$.

— There exists $N_0 \in \mathbb{N}$ such that $p_x(N_0) = N_0$.

— There exists $m \in \mathbb{N}$ such that $p_x(m) = p_x(m+1)$.

— The word x is ultimately periodic.

In particular, non ultimately periodic sequences with low complexity are the
so-called Sturmian sequences whose factor complexity is $p(n) = n + 1$ for all
$n \ge 1$. Note that such sequences are over a binary alphabet, $p(1) = 2$. For a
survey on Sturmian words, see for instance flW. Since Pansiot's work [79], the
factor complexity of a non ultimately periodic purely morphic word w is
well-known, see for instance [19, Chap. 4] or the survey [8]: there exist
constants $C_1, C_2$ such that $C_1 f(n) \le p_w(n) \le C_2 f(n)$ where
$f(n) \in \{n,\ n \log n,\ n \log\log n,\ n^2\}$.
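Factor complexity is easy to estimate on a long prefix. The Python sketch below is only an illustration (the names `prefix_of_fixed_point` and `factor_complexity` are ours). Counting factors in a finite prefix gives a lower bound for $p_w(n)$ in general; the bound is exact when the prefix is long enough to contain every factor of length n, which we assume here for the small lengths tested on the Fibonacci and Thue-Morse words.

```python
def prefix_of_fixed_point(sigma, letter, length):
    """Prefix of the fixed point of a morphism prolongable on letter."""
    word = letter
    while len(word) < length:
        word = "".join(sigma[c] for c in word)
    return word[:length]


def factor_complexity(word, n):
    """Number of distinct factors of length n occurring in the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


if __name__ == "__main__":
    fibonacci = prefix_of_fixed_point({"0": "01", "1": "0"}, "0", 2000)
    thue_morse = prefix_of_fixed_point({"0": "01", "1": "10"}, "0", 2048)
    periodic = "ab" * 1000
    for name, w in [("Fibonacci", fibonacci), ("Thue-Morse", thue_morse),
                    ("periodic", periodic)]:
        print(name, [factor_complexity(w, n) for n in range(1, 9)])
```

On these prefixes the Fibonacci word shows the Sturmian behaviour n + 1, the periodic word has bounded complexity as in Theorem 5, and the Thue-Morse word grows linearly.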

Remark 3. F. Durand has achieved a lot of work towards a general version of
Cobham's theorem for morphic words [45,46,47]. Without giving much detail
(see for instance [48] for a detailed account), with a non-erasing morphism
$\sigma$ over $A = \{a_1, \ldots, a_t\}$ (i.e., $\sigma(a_i) \neq \varepsilon$ for all i) generating a
morphic word w (also using an extra coding) is associated a matrix $M_\sigma$ (like
the adjacency matrix of a graph) where, for all i, j, $(M_\sigma)_{i,j}$ is the number of
occurrences of the letter $a_i$ in the image $\sigma(a_j)$. Considering the morphism in
Example 4, we get

$$M_\sigma = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & 2 & 1 \end{pmatrix}.$$

Then considering the irreducible components (i.e., the strongly connected
components of the associated automaton) of the matrix $M_\sigma$ and the theorem of
Perron-Frobenius, a real number $\beta > 0$ is associated with the morphism. The
word w is therefore said to be $\beta$-substitutive. Let $\alpha, \beta > 1$ be two
multiplicatively independent Perron numbers (the notion of multiplicative
independence extends to real numbers > 1). Under some mild assumptions [48], if w
is both $\alpha$-substitutive and $\beta$-substitutive, then it is ultimately periodic.
It is a natural generalization of the fact that if $k, \ell \ge 2$ are
multiplicatively independent, then a word which is both $k$-automatic and
$\ell$-automatic is ultimately periodic.

Example 5. The well-known Fibonacci word, i.e., the unique fixed point of
$\sigma : 0 \mapsto 01,\ 1 \mapsto 0$, is $\alpha$-substitutive where $\alpha$ is the Golden ratio
$(1 + \sqrt{5})/2$. Therefore, this infinite word is $k$-automatic for no integer
$k \ge 2$. Indeed, k and the Golden ratio are multiplicatively independent.
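The matrix of a morphism and its dominant eigenvalue can be computed directly. The Python sketch below is a rough illustration, assuming morphisms are encoded as dictionaries of strings; `incidence_matrix` and `dominant_eigenvalue` are hypothetical names, and the eigenvalue is only approximated by a plain power iteration, which is adequate for a primitive matrix such as the one of the Fibonacci morphism.

```python
def incidence_matrix(sigma, alphabet):
    """Entry (i, j) counts the occurrences of alphabet[i] in sigma(alphabet[j])."""
    return [[sigma[b].count(a) for b in alphabet] for a in alphabet]


def dominant_eigenvalue(matrix, steps=200):
    """Approximate the dominant eigenvalue by power iteration.

    The vector is renormalised by its largest entry at each step; the
    normalising factor converges to the Perron eigenvalue when the matrix
    is primitive.
    """
    vector = [1.0] * len(matrix)
    estimate = 0.0
    for _ in range(steps):
        image = [sum(row[j] * vector[j] for j in range(len(vector)))
                 for row in matrix]
        estimate = max(image)
        vector = [x / estimate for x in image]
    return estimate


if __name__ == "__main__":
    fibonacci = {"0": "01", "1": "0"}
    m = incidence_matrix(fibonacci, "01")
    print(m, dominant_eigenvalue(m))
    print(incidence_matrix({"a": "abcc", "b": "bcc", "c": "c"}, "abc"))
```

For the Fibonacci morphism the estimate approaches the Golden ratio, while the second matrix printed is the one attached to the morphism generating the characteristic sequence of the squares, whose eigenvalues are all equal to 1.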

In view of Theorem 3, it is desirable for a numeration basis U that the set N
be $U$-recognizable. In that case, one can use a finite automaton to test whether
or not a given word over $A_U$ is a valid $U$-representation. Taking this
requirement as a basic assumption and observing that for all integers x, y, we
have $x < y$ if and only if $\mathrm{rep}_U(x)$ is genealogically less than $\mathrm{rep}_U(y)$,
we introduce the concept of an abstract numeration system. To define the
genealogical order (also called radix or military order), first order words by
increasing length and for words of the same length, take the usual
lexicographical order induced by the ordering of the alphabet.

Definition 6. Let L be an infinite regular language over a totally ordered
alphabet (A, <). An abstract numeration system is the triple S = (L, A, <).
Ordering by increasing genealogical order the words in L provides a one-to-one
correspondence between L and N. The nth word in L (starting from 0) is denoted
by $\mathrm{rep}_S(n)$ and the inverse map $\mathrm{val}_S : L \to \mathbb{N}$ is such that
$\mathrm{val}_S(\mathrm{rep}_S(n)) = n$. Any numeration basis U such that N is $U$-recognizable is a
particular case of an abstract numeration system. In this respect, a set
$X \subseteq \mathbb{N}$ is $S$-recognizable, if $\mathrm{rep}_S(X)$ is a regular language.

A sequence $w = w_0 w_1 \cdots$ is $S$-automatic if there exists a DFAO M where
$\delta : Q \times A_k \to Q$ (resp. $\tau : Q \to A$) is the transition function (resp.
output function) of M, such that $\tau(\delta(q_0, \mathrm{rep}_S(n))) = w_n$ for all $n \ge 0$.

Example 6. Again the set of squares Q is $S$-recognizable for the abstract
numeration system $S = (a^*b^* \cup a^*c^*, \{a, b, c\}, a < b < c)$. Indeed, we have

$$a^*b^* \cup a^*c^* = \varepsilon, a, b, c, aa, ab, ac, bb, cc, aaa, \ldots$$

and one can check that $\mathrm{rep}_S(Q) = a^*$ because the growth function of the
language is $\#((a^*b^* \cup a^*c^*) \cap \{a, b, c\}^n) = 2n + 1$.
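Example 6 can be reproduced by brute force. The Python sketch below is a naive illustration (not an efficient algorithm): it enumerates all words over the ordered alphabet by increasing length and, within a length, in lexicographic order, keeping those that belong to the language. The language is described by a regular expression for Python's `re` module, and the name `genealogical_enumeration` is ours. The loop assumes the language is infinite.

```python
import re
from itertools import product


def genealogical_enumeration(pattern, alphabet, count):
    """First `count` words of the language matching `pattern`, listed by
    increasing length and, for equal lengths, in lexicographic order with
    respect to the order in which the letters of `alphabet` are given."""
    regex = re.compile(pattern)
    words, length = [], 0
    while len(words) < count:  # does not terminate on a finite language
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if regex.fullmatch(word):
                words.append(word)
        length += 1
    return words[:count]


if __name__ == "__main__":
    words = genealogical_enumeration(r"a*b*|a*c*", "abc", 30)
    print(words[:10])
    # rep_S(n^2) for n = 0, 1, 2, ...
    print([words[n * n] for n in range(5)])
```

Since the language has 2n + 1 words of each length n, exactly $n^2$ words are shorter than $a^n$, which is the first word of its length; this is what the second printed list shows.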

Theorem 4 can be generalized as follows [83] or [19, Ch. 3].

Theorem 6. An infinite word $w = w_0 w_1 w_2 \cdots$ over an alphabet A is morphic
if and only if there exists an abstract numeration system S such that w is
$S$-automatic.

Note that for generalization of Theorems 4 and 6 to a multidimensional setting,
see Salon's work [86,87] and [31] respectively. Moreover, thanks to the above
result, Durand's work can also to some extent be expressed in terms of abstract
numeration systems. Observe that in Example 6 the abstract numeration system
is based on a regular language having a polynomial growth. This corresponds
to the case where the dominating eigenvalue of the matrix associated with the
morphism is 1. Such a situation (polynomial versus exponential growth) is
considered in [49]. Indeed, note that in the discussion about a morphic version
of Durand-Cobham's theorem in Remark 3 we only considered morphisms with
exponential growth, i.e., the dominating eigenvalue being > 1.

4 Transcendental Numbers 

This short section is based on a lecture given by B. Adamczewski during the
last CANT summer school in Liège [19, Chap. 8] and on [92]. We also refer the
reader to [2]. It illustrates one of the strong links existing between
combinatorics on words and number theory. For a survey on combinatorics on
words, see for instance [17,34]. Recall that a complex number which is a root
of a non-zero polynomial with rational (or equivalently integer) coefficients is
said to be algebraic. Otherwise, it is said to be transcendental. Since Borel's
work, one thinks that base-$k$ expansions of algebraic irrational numbers are
"complex" and not much is known about their properties.

With any infinite word $w = w_1 w_2 \cdots$ over the alphabet of digits
$A_k = \{0, \ldots, k-1\}$ is associated the real number
$\sum_{i=1}^{+\infty} w_i k^{-i}$. Clearly, a real number $\alpha$ is algebraic (over Q) if and
only if, for all $z \in \mathbb{Z}$, $\alpha + z$ is algebraic. Indeed, if $\alpha$ is a root of the
polynomial $P(X) \in \mathbb{Q}[X]$, then $\alpha + z$ is a root of $P(X - z) \in \mathbb{Q}[X]$.
Hence, we can restrict ourselves to numbers in (0, 1).

Transcendence of a number whose binary expansion is Sturmian was proved in
1997 [FT].

Example 7. Consider again the Fibonacci word $f = f_1 f_2 f_3 \cdots = 010010 \cdots$.
The real number $\sum_{i=1}^{+\infty} f_i 2^{-i}$ is transcendental.

Let $k \in \mathbb{N} \setminus \{0, 1\}$. The factor complexity of the $k$-ary expansion w of every
irrational algebraic number satisfies

$$\liminf_{n \to \infty} \bigl(p_w(n) - n\bigr) = +\infty.$$

The main tool is a $p$-adic version of the Thue-Siegel-Roth theorem due to
Ridout.

A combinatorial transcendence criterion obtained in [3] using Schmidt's
subspace theorem [88] is used to obtain the following result.

Theorem 7 (Adamczewski and Bugeaud [3]). Let $k \in \mathbb{N} \setminus \{0, 1\}$. The factor
complexity of the $k$-ary expansion w of a real irrational algebraic number
satisfies

$$\lim_{n \to +\infty} \frac{p_w(n)}{n} = +\infty.$$

Let $k \ge 2$ be an integer. If $\alpha$ is a real irrational number whose $k$-ary
expansion has factor complexity in $O(n)$, then $\alpha$ is transcendental. Since it is
well-known [37] that automatic sequences have factor complexity in $O(n)$, we
can deduce that if a real irrational number has an automatic $k$-ary expansion,
then it is transcendental.

Theorem 8 (Bugeaud and Evertse [30]). Let $k \ge 2$ be an integer and $\xi$ be
a real irrational algebraic number with $0 < \xi < 1$. Then for any real number
$\eta < 1/11$, the factor complexity $p(n)$ of the $k$-ary expansion of $\xi$ satisfies

$$\lim_{n \to +\infty} \frac{p(n)}{n (\log n)^{\eta}} = +\infty.$$

In [S], it is shown that the binary expansion of an algebraic number contains
infinitely many occurrences of 7/3-powers. Hence the binary expansion of an
algebraic number contains infinitely many overlaps.

5 Combinatorial Game Theory 

Numeration systems, number theory and therefore formal language theory also
have interesting connections with combinatorial game theory. In classical
textbooks like [63,15] allusion to the game of Nim is made. See [16,57] for
background on two player combinatorial games: no chance, no hidden information,
same options for the two players who play alternately, .... In particular, in
removal games, we are looking for a winning strategy which allows a player to
consummate a win regardless of the moves chosen by the other player. If such a
strategy exists for given initial conditions, it is therefore natural to ask
about the algorithmic complexity of the computation of the winning strategy. A
first question to answer is to determine the status N or P of a given
position [57].

An $\mathcal{N}$-position, or winning position, is a position for which a winning strategy
exists for the player who starts. A $\mathcal{P}$-position is a position for which all
options lead to an $\mathcal{N}$-position, and is thus winning for the second player^1. In the
game of Nim played on two piles of tokens, two players play alternately and
remove a positive number of tokens from one of the piles. The player removing
the last token wins. Otherwise stated, the first player unable to move loses
(normal condition). In [T^], connections between the game of Nim (values of
the Sprague-Grundy function) and the notion of 2-regular functions in the sense
of Allouche and Shallit are observed (finiteness of the 2-kernel). In the famous
Wythoff's game, an extra move is allowed: removing the same positive number
of tokens from both piles. The game of Nim can easily be generalized to $n$ piles
of tokens, contrary to Wythoff's game, where extensions have been presented
but no suitable generalization is known: for the $\mathcal{P}$-positions, playing with an odd
number of piles is similar to the game of Nim and playing with an even number
of piles is hard [44,43,53,52]. In the last reference, Wythoff's game is considered
as a prime game: informally, a game whose generalization to more than one or
two piles seems to be very hard.

^1 In the game graph $G$ where vertices are positions and directed edges are the allowed
moves, the set of $\mathcal{P}$-positions is the kernel of $G$: there is no move between any two
$\mathcal{P}$-positions and from any $\mathcal{N}$-position, there exists a move to a $\mathcal{P}$-position.
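The following Python sketch illustrates the Sprague-Grundy function for two-pile Nim by computing it directly from the rules as a minimum excluded value. The comparison with the bitwise XOR of the pile sizes relies on the classical nim-sum fact, which is not stated in the text above; the names `mex`, `grundy_nim` and `SIZE` are illustrative.

```python
from functools import lru_cache


def mex(values):
    """Minimum excluded value: the least non-negative integer not in `values`."""
    m = 0
    while m in values:
        m += 1
    return m


@lru_cache(maxsize=None)
def grundy_nim(x, y):
    """Sprague-Grundy value of the two-pile Nim position (x, y)."""
    # Options: remove a positive number of tokens from exactly one pile.
    options = {grundy_nim(u, y) for u in range(x)} | {grundy_nim(x, v) for v in range(y)}
    return mex(options)


SIZE = 16
# Classical fact (assumed here, checked on a small grid): the value is the XOR of the piles.
matches_xor = all(grundy_nim(x, y) == x ^ y for x in range(SIZE) for y in range(SIZE))
# P-positions are the positions of Grundy value 0.
p_positions = [(x, y) for x in range(SIZE) for y in range(SIZE) if grundy_nim(x, y) == 0]
print(matches_xor, p_positions[:4])
```

On this grid the positions of value 0 are exactly those with two equal piles, where the second player can mirror every move.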

For instance, A. Fraenkel makes great use of various numeration systems to
get characterizations of $\mathcal{P}$-positions [35]. As an example, in Wythoff's game, a
position $(x, y)$ is a $\mathcal{P}$-position if and only if the $F$-representation $\mathrm{rep}_F(x)$ ends
with an even number of zeroes and $\mathrm{rep}_F(y) = \mathrm{rep}_F(x)0$ is the left shift of the first
component, where $F$ is the numeration basis given by the Fibonacci sequence
from Example 2 [53]. Similarly, $(x, y)$ is a $\mathcal{P}$-position if there exists $n$ such that
$(x, y) = (\lfloor n\alpha \rfloor, \lfloor n\alpha^2 \rfloor)$ where $\alpha$ is the Golden ratio. So complementary Beatty
sequences also enter the picture of combinatorial games [42,53,38].
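Both characterizations can be checked on small positions. The sketch below computes the $\mathcal{P}$-positions of Wythoff's game directly from the moves and compares them with the Beatty pairs and with the Fibonacci-representation criterion. It assumes that the Fibonacci basis is $1, 2, 3, 5, 8, \ldots$ with greedy representations, restricts to positions with $x \le y$, and uses illustrative names and an arbitrary bound `LIMIT`.

```python
from math import isqrt


def wythoff_p_positions(limit):
    """P-positions (x, y) with x <= y < limit, computed directly from the moves."""
    is_p = [[False] * limit for _ in range(limit)]
    for x in range(limit):
        for y in range(limit):
            options = [(u, y) for u in range(x)]            # take from the first pile
            options += [(x, v) for v in range(y)]           # take from the second pile
            options += [(x - t, y - t) for t in range(1, min(x, y) + 1)]  # take from both
            # A position is P exactly when none of its options is a P-position.
            is_p[x][y] = not any(is_p[u][v] for u, v in options)
    return [(x, y) for x in range(limit) for y in range(x, limit) if is_p[x][y]]


def rep_fib(n):
    """Greedy representation of n in the basis 1, 2, 3, 5, 8, ... (empty word for 0)."""
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    digits = []
    for f in reversed(fibs):
        if f <= n:
            digits.append("1")
            n -= f
        else:
            digits.append("0")
    return "".join(digits).lstrip("0")


def is_p_by_fibonacci(x, y):
    """Criterion from the text: even number of trailing zeroes, and y is the left shift of x."""
    rx = rep_fib(x)
    trailing_zeroes = len(rx) - len(rx.rstrip("0"))
    return trailing_zeroes % 2 == 0 and rep_fib(y) == rx + "0"


def beatty_pair(n):
    """(floor(n*alpha), floor(n*alpha^2)) for the Golden ratio alpha, in exact integer arithmetic."""
    lower = (n + isqrt(5 * n * n)) // 2
    return lower, lower + n          # alpha^2 = alpha + 1


LIMIT = 60
p_positions = wythoff_p_positions(LIMIT)
beatty = []
n = 0
while beatty_pair(n)[1] < LIMIT:
    beatty.append(beatty_pair(n))
    n += 1
print(p_positions[:6])
print(p_positions == beatty)
```

The position $(0, 0)$ is left out of the Fibonacci check, since the empty representation has no left shift distinct from itself as a number.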

In [H], moves that can be adjoined without changing the set of $\mathcal{P}$-positions
are characterized using the formalism of morphisms and the fact that the computation
of the successor function in the Fibonacci system is realized by a finite
transducer [59]. Let $a \in \mathbb{N} \setminus \{0, 1\}$. In the parameterized version of Wythoff's
game where a player can remove $k$ tokens from one pile and $\ell$ from the other
[53], with the condition $|k - \ell| < a$, the Ostrowski numeration system [18] based
on the convergents of a continued fraction is used.

It is interesting to note that the $\mathcal{P}$-positions of Wythoff's game
are also characterized by the Fibonacci word introduced in Example 7. The $n$th
$\mathcal{P}$-position is given by the pair of indices of the $n$th symbol 0 and $n$th symbol
1 occurring in the Fibonacci word. This simple observation relates combinatorial
properties of morphic words like the Fibonacci or Tribonacci words to
characterizations of $\mathcal{P}$-positions of games [44,43,52]. Morphic characterizations
of $\mathcal{P}$-positions seem to have recently raised some interest among combinatorial game
theorists [52].
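This observation is easy to try out. The sketch below assumes that the Fibonacci word is the fixed point of the morphism $0 \mapsto 01$, $1 \mapsto 0$ and that its symbols are indexed from 1; both conventions are assumptions, since the example defining the word is not reproduced here.

```python
def fibonacci_word(length):
    """Prefix of the fixed point of the morphism 0 -> 01, 1 -> 0 starting from 0."""
    word = "0"
    while len(word) < length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word[:length]


word = fibonacci_word(200)
# Indices (starting at 1) of the successive symbols 0 and 1.
zeros = [i for i, c in enumerate(word, start=1) if c == "0"]
ones = [i for i, c in enumerate(word, start=1) if c == "1"]
pairs = list(zip(zeros, ones))
print(pairs[:5])
```

The $n$th pair has difference $n$, as expected for the non-zero $\mathcal{P}$-positions of Wythoff's game.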



6 Applications for Verification of Infinite State Systems 

Sets of numbers recognized by finite automata arise when analyzing systems
with unbounded mixed variables taking integer or real values. Systems such as
timed or hybrid automata are therefore considered [21]. One needs to develop
data structures representing the sets manipulated during the exploration of infinite
state systems. For instance, it is often necessary to compute the set of reachable
configurations of such a system.

Let $k \ge 2$ be an integer. Recall that a set $X \subseteq \mathbb{R}$ is $k$-recognizable if there
exists a Büchi automaton accepting all the $k$-representations of the elements
in $X$. This notion extends to subsets of $\mathbb{R}^d$ and to Real Vector Automata, or
RVA. Also, the Büchi-Bruyère theorem giving a first-order logical characterization
of $k$-recognizable sets of integers holds in this context of real numbers for a
suitable structure $(\mathbb{R}, \mathbb{Z}, +, 0, <, 14)$, see [53]. Roughly speaking, definability in
$(\mathbb{R}, \mathbb{Z}, +, 0, <)$ of subsets of $\mathbb{R}^d$ is the natural extension of ultimate periodicity
of subsets of $\mathbb{N}$.

Theorem 9. [24] If a subset $X \subseteq \mathbb{R}^d$ is definable by a first-order formula in
$(\mathbb{R}, \mathbb{Z}, +, 0, <)$, then $X$ written in base $k \ge 2$ is recognizable by a weak deterministic
RVA $A$.

Weakness means that each strongly connected component of A contains only 
accepting states or non-accepting states. 
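The weakness condition is purely structural and can be tested on the transition graph alone. The following sketch is one possible check, assuming the automaton is given as a dictionary from states to successor states (transition labels are irrelevant here) and a set of accepting states; the encoding and the function names are hypothetical, and the components are computed with Kosaraju's algorithm.

```python
def strongly_connected_components(states, successors):
    """Strongly connected components of a directed graph (Kosaraju's algorithm)."""
    order, seen = [], set()

    def dfs(state, graph, out):
        seen.add(state)
        for nxt in graph.get(state, ()):
            if nxt not in seen:
                dfs(nxt, graph, out)
        out.append(state)            # post-order

    for s in states:
        if s not in seen:
            dfs(s, successors, order)

    # Build the reversed graph for the second pass.
    reverse = {}
    for s, targets in successors.items():
        for t in targets:
            reverse.setdefault(t, []).append(s)

    seen.clear()
    components = []
    for s in reversed(order):
        if s not in seen:
            component = []
            dfs(s, reverse, component)
            components.append(set(component))
    return components


def is_weak(states, successors, accepting):
    """True if every component contains only accepting or only non-accepting states."""
    return all(c <= accepting or not (c & accepting)
               for c in strongly_connected_components(states, successors))


chain = {0: [0, 1], 1: [1, 2], 2: [2]}   # three singleton components
cycle = {0: [1], 1: [0]}                 # one component mixing both kinds of states
print(is_weak([0, 1, 2], chain, {1}), is_weak([0, 1], cycle, {1}))
```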

Theorem 10. [22] Let $k, \ell \ge 2$ be two multiplicatively independent integers. If
$X \subseteq \mathbb{R}$ is both $k$- and $\ell$-recognizable by two weak deterministic RVA, then it is
definable in $(\mathbb{R}, \mathbb{Z}, +, 0, <)$.

The extension of the Cobham-Semenov theorem to subsets of $\mathbb{R}^d$ in this setting
is discussed in [23]. The case of two coprime bases was first considered in
[22]. Though written in a completely different language, a similar result was
independently obtained in [1]. This latter paper is motivated by the study of some
fractal sets.

Remark 4. Weak deterministic RVA are of particular interest from an algorithmic
point of view. They recognize languages that are recognizable by both deterministic
Büchi and deterministic co-Büchi automata. For instance, minimization
algorithms in $O(n \log n)$ exist for this class [74].

7 Abridged Bibliographic Notes 

With a gentle introduction to the logical formalism, a good way to start with
integer base numeration systems is to consider [29]. Each time I come back to
this very well written survey, I learn something new. Then, it is a good idea to
move to the "state of the art" linear numeration basis where the characteristic
polynomial of the recurrence is the minimal polynomial of a Pisot number [28].
In parallel, one should consider Frougny's chapter [76, Chap. 7] and her very
interesting work on the normalization map [58] and beta-expansions [60]. As a
good textbook on some of the aspects presented here, consider [12]. The original
paper of Cobham [37] is also worth reading. For some general surveys on
factor complexity and the Thue-Morse word, without any required background,
see [8TT1].

Then I cannot resist advertising [19] where in the spirit of Lothaire's series, 
we try to present the fruitful links existing between combinatorics on words, 
automata theory and number theory. It presents in a self-contained expository 
book much more material than is presented in this survey (ergodic theory, Rauzy 
fractal, joint spectral radius, ...).

For a list of pointers on Cobham's theorem in various contexts, see the updated
survey [48]. Accounts of Perron-Frobenius theory can be found in many
classical textbooks, but probably [73] is worth reading.

Connections between symbolic dynamics and formal language theory are
fruitful: for the reader with no background in dynamics (for instance, no knowledge
of measure theory is required) and on a very introductory level, consider
[90]. Then, move to the survey [13] and [82].

8 Some Open Problems

We conclude with some general (and probably quite hard) open problems. 

— As mentioned in Section 3, the most general version of Cobham's theorem
still relies on some mild assumptions about the considered morphisms (details
are not given in this survey). F. Durand refers to these as "good substitutions".
One could hope to relax these hypotheses and still get the same
result with full generality [IQ. Up to now there is no proof of a Cobham-like
theorem for a substitution having no main sub-substitution having the same
dominating eigenvalue, like $a \mapsto aa0$, $0 \mapsto 01$ and $1 \mapsto 0$. In this latter example,
the dominating eigenvalue is 2 but the substitution restricted to the
alphabet $\{0, 1\}$ has $(1 + \sqrt{5})/2$ as dominating eigenvalue.
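The two eigenvalues in this example can be observed numerically as growth rates of the lengths $|\sigma^n(c)|$. The sketch below iterates the substitution on letter counts rather than on words, so that many iterations remain cheap; the name `length_growth` and the choice of 60 iterations are illustrative assumptions.

```python
def length_growth(morphism, letter, steps):
    """Ratio |sigma^steps(letter)| / |sigma^(steps-1)(letter)|, computed on letter counts."""
    counts = {letter: 1}
    previous = 1
    for _ in range(steps):
        previous = sum(counts.values())
        new_counts = {}
        for c, k in counts.items():
            # Each of the k occurrences of c contributes the letters of its image.
            for image_letter in morphism[c]:
                new_counts[image_letter] = new_counts.get(image_letter, 0) + k
        counts = new_counts
    return sum(counts.values()) / previous


sigma = {"a": "aa0", "0": "01", "1": "0"}
print(length_growth(sigma, "a", 60))   # close to 2
print(length_growth(sigma, "0", 60))   # close to (1 + sqrt(5)) / 2
```

Starting from the letter $a$ the lengths roughly double at each step, whereas starting from 0 only the sub-alphabet $\{0, 1\}$ is visited and the ratio approaches the Golden ratio.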

— Let us come back to Cobham's theorem, but this time for Gaussian integers
$G = \{a + ib \mid a, b \in \mathbb{Z}\}$. Indeed, these numbers have nice representations
using the so-called canonical number systems [68]. For canonical numeration
systems in algebraic number fields, every integer has a unique finite
expansion, which is computed starting with the least significant digit. A
Cobham-like conjecture for Gaussian integers [HS] is related to the famous
Four Exponentials conjecture: let $\{\lambda_1, \lambda_2\}$ and $\{x_1, x_2\}$ be two pairs of rationally
independent complex numbers. Then, one of the numbers $e^{\lambda_1 x_1}$, $e^{\lambda_1 x_2}$,
$e^{\lambda_2 x_1}$, $e^{\lambda_2 x_2}$ is transcendental; see for instance [91].
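As an illustration of an expansion computed from the least significant digit, the sketch below uses the base $-1 + i$ with digit set $\{0, 1\}$, a classical canonical number system for the Gaussian integers. The choice of this particular base is an assumption made for the example and is not taken from the text; the function names are illustrative.

```python
def to_base_minus_one_plus_i(a, b):
    """Digits (least significant first) of a + ib in base -1 + i with digits {0, 1}."""
    digits = []
    while (a, b) != (0, 0):
        d = (a + b) % 2              # a + ib is divisible by -1 + i iff a + b is even
        a -= d
        # Exact division by -1 + i, i.e. multiplication by (-1 - i) / 2.
        a, b = (b - a) // 2, (-a - b) // 2
        digits.append(d)
    return digits


def from_digits(digits):
    """Evaluate sum(d_j * (-1 + i)**j) with exact integer arithmetic (Horner scheme)."""
    re, im = 0, 0
    for d in reversed(digits):
        # Multiply the accumulator by -1 + i, then add the digit.
        re, im = -re - im + d, re - im
    return re, im


digits = to_base_minus_one_plus_i(2, 0)
print(digits, from_digits(digits))
```

For instance, $2 = (-1 + i)^3 + (-1 + i)^2$, so its expansion reads 1100 with the most significant digit first.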

— The philosophy of Cobham's theorem also appears when considering self-generating
sets as introduced by Kimberling [69]. For instance, consider the
affine maps $f : \mathbb{N} \to \mathbb{N}, x \mapsto 2x + 1$ and $g : \mathbb{N} \to \mathbb{N}, x \mapsto 4x + 2$. A self-generating
set obtained from $f$ and $g$ can be defined as the smallest subset
$S$ of $\mathbb{N}$ containing 0 and such that $f(S) \subseteq S$ and $g(S) \subseteq S$. In our example,
the first few elements in $S$ are

0, 1, 2, 3, 5, 6, 7, 10, 11, 13, 14, 15, 21, 22, 23, 26, 27, 29, 30, 31, 42, 43, 45, 46, 47, 53, ... .

One can therefore study the $k$-recognizability of $S$. If one considers maps
where the multiplicative constants are multiplicatively independent, then
Allouche, Shallit and Skordev conjectured that the corresponding set cannot
be $k$-recognizable [10]. With some technical hypothesis about the multiplicative
coefficients when there are at least three affine maps, this conjecture has
been proved to be true in [67]. One could hope to prove this conjecture in full
generality. A possible connection with smooth numbers (having only small
prime factors in their decomposition) has been pointed out by J. Shallit.
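A self-generating set of this kind can be enumerated by a simple closure computation. The sketch below reads the two maps of the example as $x \mapsto 2x + 1$ and $x \mapsto 4x + 2$ with seed 0, and relies on the fact that both maps are strictly increasing, so that elements below a bound can only be produced from smaller elements. The function name and the bound 60 are illustrative assumptions.

```python
def self_generating_set(maps, seeds, bound):
    """Elements <= bound of the smallest set containing `seeds` and closed under `maps`.

    Assumes every map is strictly increasing on the non-negative integers.
    """
    found = set()
    stack = [s for s in seeds if s <= bound]
    while stack:
        x = stack.pop()
        if x in found:
            continue
        found.add(x)
        stack.extend(y for y in (m(x) for m in maps) if y <= bound and y not in found)
    return sorted(found)


S = self_generating_set([lambda x: 2 * x + 1, lambda x: 4 * x + 2], [0], 60)
print(S)
```

In base 2 the first map appends the digit 1 and the second appends the block 10, so under this reading the elements of $S$ are exactly the integers whose binary expansion contains no two consecutive zeroes, which makes $S$ 2-recognizable.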

— In combinatorial game theory, the Sprague-Grundy function $g$ is of great
interest. For instance, the positions for which $g$ vanishes are exactly the $\mathcal{P}$-positions
of the game and, when considering sums of games (several games are
played simultaneously and at each turn, the player chooses in which of those
games he will make a move), it can be used to distinguish $\mathcal{N}$-positions (TB].
For Wythoff's game, little is known about this function (see for instance
[55]) even if its recursive definition is simple. The value of $g(x, y)$ is the
minimum excluded value (Mex) of the set of $g(u, v)$ where $(u, v)$ ranges
over the options reachable from $(x, y)$. By definition, $\mathrm{Mex}\,\emptyset = 0$ and
$\mathrm{Mex}\,S = \min(\mathbb{N} \setminus S)$ for every finite set $S$.

        0  1  2  3  4  5  6  7  8  9

   0    0  1  2  3  4  5  6  7  8  9
   1    1  2  0  4  5  3  7  8  6 10
   2    2  0  1  5  3  4  8  6  7 11
   3    3  4  5  6  2  0  1  9 10 12
   4    4  5  3  2  7  6  9  0  1  8
   5    5  3  4  0  6  8 10  1  2  7

Let $F$ be the Fibonacci numeration basis. As suggested by the developments
considered in [41], could the above infinite array reveal some morphic structure,
like having a finite $F$-kernel, where this set could be defined as the set
of subarrays

$$\bigl(g(x, y)\bigr)_{\mathrm{rep}_F(x) \in \{0,1\}^* u,\ \mathrm{rep}_F(y) \in \{0,1\}^* v}$$

for given suffixes $u, v$? For the generalization of the $k$-kernel, see for instance
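As a concrete illustration of the recursive Mex definition used in this item, the following sketch computes the first values of the Sprague-Grundy function of Wythoff's game directly from the three kinds of moves. The function names and the size of the printed window are illustrative choices.

```python
from functools import lru_cache


def mex(values):
    """Minimum excluded value of a finite set of non-negative integers."""
    m = 0
    while m in values:
        m += 1
    return m


@lru_cache(maxsize=None)
def grundy_wythoff(x, y):
    """Sprague-Grundy value of the Wythoff position (x, y)."""
    options = [(u, y) for u in range(x)]                          # take from the first pile
    options += [(x, v) for v in range(y)]                         # take from the second pile
    options += [(x - t, y - t) for t in range(1, min(x, y) + 1)]  # take the same amount from both
    return mex({grundy_wythoff(u, v) for u, v in options})


table = [[grundy_wythoff(x, y) for y in range(10)] for x in range(6)]
for row in table:
    print(" ".join(f"{v:2d}" for v in row))
```

The zero entries of the computed array are the $\mathcal{P}$-positions, for instance $(1, 2)$, $(3, 5)$ and $(4, 7)$.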



— Theorem 10 is a Cobham-like theorem for sets of real numbers, definability
in $(\mathbb{R}, \mathbb{Z}, +, 0, <)$ being the counterpart to ultimate periodicity of a set
of integers. Can a simpler proof of this result be achieved, for instance by
considering the techniques developed in [T]? Also, could this result be extended
to other kinds of representations of real numbers? For instance, considering
$\beta$-expansions of real numbers, we could define $\beta$-recognizable sets
of real numbers and consider two multiplicatively independent real numbers
$\alpha, \beta > 1$. As a first step (and to mimic what has chronologically been done
for sets of integers), one could consider a set of real numbers $X \subseteq \mathbb{R}$ which
is both $k$-recognizable and $\beta$-recognizable by two weak deterministic RVA,
with $k \ge 2$ an integer and $\beta$ a Pisot number like the Golden ratio, and ask:
is $X$ definable in $(\mathbb{R}, \mathbb{Z}, +, 0, <)$?

— About abstract numeration systems, several questions about $S$-recognizable
sets are open. For instance, is there some reasonable logical characterization
of the $S$-recognizable sets of integers which could be compared to the characterization
in the extended Presburger arithmetic $(\mathbb{N}, +, V_k)$? But one can
notice that in general, if $X$ and $Y$ are $S$-recognizable, there is no reason for
$X + Y$ to be $S$-recognizable (even when considering multiplication by a
constant). Another question is to relate the growth function $n \mapsto \#(L \cap A^n)$
of the regular language $L$ on which the abstract numeration system $S$ is
based and the $S$-recognizable sets. For instance, if $P \in \mathbb{N}[X]$ is a polynomial
such that $P(\mathbb{N})$ is $S$-recognizable, what can be said about the growth function
of the language of numeration? Results like the one found in [M] could
be of interest.

— Recently numeration systems based on the powers of a rational number have
been introduced [6] (motivated by a number theoretic question from Mahler).
These numerations reveal interesting and intriguing properties. For instance,
little is known about the properties of the language of numeration. For
a given prefix $w$, compute the number of words of length $n$ in the quotient
of this language by $w$.

— It is well-known since the work of Cobham [35] that a morphic infinite word
$w = \tau(\sigma^\omega(a))$, where $\sigma$ and $\tau$ are arbitrary morphisms (both morphisms
can be erasing and $\tau$ is not necessarily a coding), can be generated by
a non-erasing morphism $\mu$ and a coding $\nu$. See for instance [12] for a comprehensive
proof or [66] for an alternative presentation. All the known proofs
rely on morphisms and are quite long: could one describe in the formalism
of automata theory a somewhat simpler proof?

— Let me also mention Hollander's conjecture for the case where, for a linear numeration
basis $U$, the dominant root condition is not satisfied [65]. He has conjectured
that $\mathrm{rep}_U(\mathbb{N})$ can be regular only if there exists $n$ such that

$$\lim_{j \to \infty} U_{jn+k} / U_{(j-1)n+k}$$

exists and is independent of $k$, and the characteristic polynomial $p(X)$ of
$U$ is such that $p(X) = q(X^n)$ where $q(X)$ is the minimal polynomial for a
recurrence which gives a regular language [26].

— Let $p$ be a prime. Derksen proved that the zero set of a linear recurrence
over a field of characteristic $p$ is $p$-automatic [40,2]. Could such a result and
Cobham's theorem be used to get back the classical Skolem-Mahler-Lech
theorem (the zero set of a linear recurrence over a field of characteristic 0 is
ultimately periodic)?

— The reader fond of logic could also look back at the list of open problems
given by Michaux and Villemaire [75]. This survey paper is devoted to problems
related to Büchi's characterization of sets of natural numbers recognizable
by finite automata in base $k$, as well as to Cobham's and Semenov's
extensions of it.